In this project, you'll define and train a Generative Adversarial Network (GAN) of your own creation on a dataset of faces. Your goal is to get a generator network to generate new images of faces that look as realistic as possible!
The project is broken down into a series of tasks, from defining new architectures to training adversarial networks. At the end of the notebook, you'll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of noise.
You'll be using the CelebFaces Attributes Dataset (CelebA) to train your adversarial networks.
This dataset has higher-resolution images than the datasets you have previously worked with (like MNIST or SVHN), so you should be prepared to define deeper networks and train them for longer to get good results. It is suggested that you utilize a GPU for training.
Since the project's main focus is on building the GANs, we've done some of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. Some sample data is shown below.

If you are working locally, you can download this data by clicking here
This is a zip file that you'll need to extract into the home directory of this notebook for further loading and processing. After extracting the data, you should be left with a data directory processed-celeba-small/.
# run this once to unzip the file
!unzip processed-celeba-small.zip
from glob import glob
from typing import Tuple, Callable, Dict
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import os
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, ToTensor, Resize, Lambda, RandomHorizontalFlip, InterpolationMode
import tests
data_dir = 'processed_celeba_small/celeba/'
The CelebA dataset contains over 200,000 celebrity images with annotations. Since you're going to be generating faces, you won't need the annotations; you'll only need the images. Note that these are color images with 3 color channels (RGB) each.
This pre-processed dataset is a smaller subset of the very large CelebA dataset and contains roughly 30,000 images.
Your first task consists of building the dataloader. To do so, you need to do the following:
The get_transforms function should output a torchvision.transforms.Compose of different transformations. You have two constraints:
def get_transforms(size: Tuple[int, int]) -> Callable:
    """ Transforms to apply to the images. """
    # TODO: edit this function by appending transforms to the list below
    transforms = [ToTensor(),
                  Resize(size, interpolation=InterpolationMode.BICUBIC),
                  Lambda(lambda x: x * 2.0 - 1.0),
                  RandomHorizontalFlip(p=0.5)]
    return Compose(transforms)
The DatasetDirectory class is a torch Dataset that reads from the above data directory. The __getitem__ method should output a transformed tensor and the __len__ method should output the number of files in our dataset. You can look at this custom dataset for ideas.
class DatasetDirectory(Dataset):
    """
    A custom dataset class that loads images from a folder.
    args:
    - directory: location of the images
    - transforms: transform function to apply to the images
    - extension: file format
    """
    def __init__(self,
                 directory: str,
                 transforms: Callable = None,
                 extension: str = '.jpg'):
        self.dir = directory
        self.transform = transforms
        self.extension = extension
        # filter with a list comprehension: removing items from a list
        # while iterating over it would skip elements
        self.data = [f for f in os.listdir(directory)
                     if f.endswith(self.extension)]

    def __len__(self) -> int:
        """ returns the number of items in the dataset """
        return len(self.data)

    def __getitem__(self, index: int) -> torch.Tensor:
        """ load an image and apply the transformation """
        filename = self.data[index]
        img = Image.open(os.path.join(self.dir, filename)).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your dataset implementation
dataset = DatasetDirectory(data_dir, get_transforms((64, 64)))
tests.check_dataset_outputs(dataset)
The functions below will help you visualize images from the dataset.
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
def denormalize(images):
    """Transform images from [-1.0, 1.0] to [0, 255] and cast them to uint8."""
    return ((images + 1.) / 2. * 255).astype(np.uint8)

# plot a sample of images from the dataset
fig = plt.figure(figsize=(20, 4))
plot_size = 20
for idx in np.arange(plot_size):
    ax = fig.add_subplot(2, int(plot_size / 2), idx + 1, xticks=[], yticks=[])
    img = dataset[idx].numpy()
    img = np.transpose(img, (1, 2, 0))
    img = denormalize(img)
    ax.imshow(img)
As you know, a GAN is comprised of two adversarial networks, a discriminator and a generator. Now that we have a working data pipeline, we need to implement the discriminator and the generator.
Feel free to implement any additional class or function.
The discriminator's job is to score real and fake images. You have two constraints here:
Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.
Use Conv2d layers with the correct hyperparameters, or Pooling layers.

from torch.nn import Module
class Discriminator(Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 4, stride=2, padding=1, bias=False)
        self.conv2 = nn.Conv2d(64, 128, 4, stride=2, padding=1, bias=False)
        self.conv3 = nn.Conv2d(128, 256, 4, stride=2, padding=1, bias=False)
        self.conv4 = nn.Conv2d(256, 512, 4, stride=2, padding=1, bias=False)
        self.flatten = nn.Flatten()
        self.activation = nn.LeakyReLU(0.2)
        self.norm1 = nn.BatchNorm2d(128)
        self.norm2 = nn.BatchNorm2d(256)
        self.norm3 = nn.BatchNorm2d(512)
        self.fc1 = nn.Linear(512*4*4, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.activation(self.conv1(x))
        x = self.activation(self.norm1(self.conv2(x)))
        x = self.activation(self.norm2(self.conv3(x)))
        x = self.activation(self.norm3(self.conv4(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to check your discriminator implementation
discriminator = Discriminator()
tests.check_discriminator(discriminator)
The generator's job is to create the "fake" images and to learn the dataset distribution. You have three constraints here:
The generator takes as input a latent vector of shape [batch_size, latent_dimension, 1, 1]. Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.
Use ConvTranspose2d layers.

class Generator(Module):
    def __init__(self, latent_dim: int, dropout_prob=0.2):
        super(Generator, self).__init__()
        self.latent_dim = latent_dim
        self.dropout_prob = dropout_prob
        self.fc1 = nn.Linear(self.latent_dim, 4*4*512)
        self.conv1 = nn.ConvTranspose2d(512, 256, 2, 2, 0)
        self.conv2 = nn.ConvTranspose2d(256, 128, 2, 2, 0)
        self.conv3 = nn.ConvTranspose2d(128, 64, 2, 2, 0)
        self.conv4 = nn.ConvTranspose2d(64, 3, 2, 2, 0)
        self.norm1 = nn.BatchNorm2d(256)
        self.norm2 = nn.BatchNorm2d(128)
        self.norm3 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(self.dropout_prob)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # [batch_size, latent_dim, 1, 1] -> [batch_size, latent_dim]
        x = self.flatten(x)
        x = self.dropout(self.fc1(x))
        # reshape to a 4x4 feature map, then upsample 4 -> 8 -> 16 -> 32 -> 64
        x = x.view(-1, 512, 4, 4)
        x = self.relu(self.norm1(self.conv1(x)))
        x = self.relu(self.norm2(self.conv2(x)))
        x = self.relu(self.norm3(self.conv3(x)))
        x = self.tanh(self.conv4(x))
        return x
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your generator implementation
latent_dim = 128
generator = Generator(latent_dim)
tests.check_generator(generator, latent_dim)
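Before training, many DCGAN implementations re-initialize the convolutional and batch-norm weights from a narrow normal distribution, which often stabilizes early training. A hedged sketch of that trick (the helper name `weights_init` is our own choice, not required by the project):

```python
import torch.nn as nn

def weights_init(m: nn.Module) -> None:
    """DCGAN-style init: N(0, 0.02) for conv weights, N(1, 0.02) for
    batch-norm scales, zeros for batch-norm biases."""
    classname = m.__class__.__name__
    if 'Conv' in classname:                       # covers Conv2d and ConvTranspose2d
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif 'BatchNorm' in classname:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0.0)
```

You would apply it with `generator.apply(weights_init)` and `discriminator.apply(weights_init)` after constructing the networks.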
In the following section, we create the optimizers for the generator and discriminator. You may want to experiment with different optimizers, learning rates and other hyperparameters as they tend to impact the output quality.
import torch.optim as optim

lr = 0.001
beta1 = 0.5
beta2 = 0.999

def create_optimizers(generator: Module, discriminator: Module):
    """ This function should return the optimizers of the generator and the discriminator """
    # TODO: implement the generator and discriminator optimizers
    g_optimizer = optim.Adam(generator.parameters(), lr, (beta1, beta2))
    d_optimizer = optim.Adam(discriminator.parameters(), lr, (beta1, beta2))
    return g_optimizer, d_optimizer
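One variation worth experimenting with is the two time-scale update rule (TTUR): give the discriminator a higher learning rate than the generator so that neither network runs away. A sketch of an alternative optimizer factory (the function name and learning rates are illustrative, not tuned for this project):

```python
import torch.nn as nn
import torch.optim as optim

def create_optimizers_ttur(generator: nn.Module, discriminator: nn.Module):
    """Two time-scale update rule: slower generator, faster discriminator."""
    g_optimizer = optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
    d_optimizer = optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.5, 0.999))
    return g_optimizer, d_optimizer
```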
In this section, we are going to implement the loss functions for the generator and the discriminator. You can and should experiment with different loss functions.
Some tips:
The generator's goal is to get the discriminator to think its generated images (= "fake" images) are real.
def generator_loss(fake_logits):
    """ Generator loss, takes the fake scores as inputs. """
    criterion = nn.BCEWithLogitsLoss()
    # one-sided label smoothing: targets of 0.98 instead of 1.0
    labels = torch.ones_like(fake_logits).to(device) * 0.98
    loss = criterion(fake_logits, labels)
    return loss
We want the discriminator to give high scores to real images and low scores to fake ones and the discriminator loss should reflect that.
def discriminator_loss(real_logits, fake_logits):
    """ Discriminator loss, takes the real and fake logits as inputs.
    Note: backward() is called here on each term, so the training step
    only needs to call the optimizer's step(). """
    criterion = nn.BCEWithLogitsLoss()
    # smoothed targets for real images, hard zeros for fake ones
    real_labels = torch.ones_like(real_logits).to(device) * 0.98
    fake_labels = torch.zeros_like(fake_logits).to(device)
    real_loss = criterion(real_logits, real_labels)
    real_loss.backward()
    fake_loss = criterion(fake_logits, fake_labels)
    fake_loss.backward()
    loss = real_loss + fake_loss
    return loss
In the course, we discussed the importance of the gradient penalty in training certain types of GANs. Implementing this function is not required and depends on some of the design decisions you made (discriminator architecture, loss functions).
def gradient_penalty(discriminator, real_samples, fake_samples):
    """ Optional gradient penalty to regularize the discriminator. """
    gp = 0
    # TODO (Optional): implement the gradient penalty
    return gp
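If you do go the WGAN-GP route, the standard penalty evaluates the discriminator on random interpolations between the real and fake batches and pushes the gradient norm toward 1. A self-contained sketch, under the assumption that the discriminator returns one score per sample:

```python
import torch

def wgan_gradient_penalty(discriminator, real_samples, fake_samples):
    """WGAN-GP: penalize (||grad D(x_hat)||_2 - 1)^2 on interpolated samples."""
    batch_size = real_samples.size(0)
    # one interpolation coefficient per sample, broadcast over all other dims
    alpha = torch.rand(batch_size, *([1] * (real_samples.dim() - 1)),
                       device=real_samples.device)
    interpolated = (alpha * real_samples + (1 - alpha) * fake_samples).requires_grad_(True)
    scores = discriminator(interpolated)
    gradients = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,        # needed so the penalty itself can be backpropagated
    )[0]
    gradients = gradients.view(batch_size, -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```

The result is typically added to the discriminator loss scaled by a coefficient (10 in the WGAN-GP paper).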
Training will involve alternating between training the discriminator and the generator. You'll use your generator_loss and discriminator_loss functions to help you calculate the losses.
Each function should do the following:
def generator_step(batch_size: int, latent_dim: int) -> Dict:
    """ One training step of the generator. """
    g_optimizer.zero_grad()
    # sample a latent vector and generate a batch of fake images
    z = np.random.uniform(-1, 1, size=(batch_size, latent_dim, 1, 1))
    vector = torch.from_numpy(z).float().to(device)
    fake_image = generator(vector)
    logits = discriminator(fake_image)
    g_loss = generator_loss(logits)
    g_loss.backward()
    g_optimizer.step()
    return {'loss': g_loss}
def discriminator_step(batch_size: int, latent_dim: int, real_images: torch.Tensor) -> Dict:
    """ One training step of the discriminator. """
    d_optimizer.zero_grad()
    z = np.random.uniform(-1, 1, size=(batch_size, latent_dim, 1, 1))
    vector = torch.from_numpy(z).float().to(device)
    # detach so no gradients are propagated back into the generator
    fake_image = generator(vector).detach()
    fake_logits = discriminator(fake_image)
    real_logits = discriminator(real_images)
    # discriminator_loss calls backward() internally
    d_loss = discriminator_loss(real_logits, fake_logits)
    d_optimizer.step()
    return {'loss': d_loss, 'gp': 0}
You don't have to implement anything here but you can experiment with different hyperparameters.
from datetime import datetime
# you can experiment with different dimensions of latent spaces
latent_dim = 128
# update to cpu if you do not have access to a gpu
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# number of epochs to train your model
n_epochs = 20
# number of images in each batch
batch_size = 64
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
print_every = 1
# Create optimizers for the discriminator D and generator G
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)
g_optimizer, d_optimizer = create_optimizers(generator, discriminator)
dataloader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=True,
                        num_workers=4,
                        drop_last=True,
                        pin_memory=False)
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
def display(fixed_latent_vector: torch.Tensor):
    """ helper function to display images during training """
    fig = plt.figure(figsize=(14, 4))
    plot_size = 16
    for idx in np.arange(plot_size):
        ax = fig.add_subplot(2, int(plot_size / 2), idx + 1, xticks=[], yticks=[])
        img = fixed_latent_vector[idx, ...].detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = denormalize(img)
        ax.imshow(img)
    plt.show()
You should experiment with different training strategies. For example:
Implement your training strategy below.
# preview one batch of real images
for batch_i, real_images in enumerate(dataloader):
    print(real_images.shape)
    display(real_images)
    break
# fixed latent vector to monitor generator progress across epochs
z = np.random.uniform(-1, 1, size=(16, latent_dim, 1, 1))
fixed_latent_vector = torch.from_numpy(z).float().to(device)

losses = []
for epoch in range(n_epochs):
    generator.train()
    discriminator.train()
    for batch_i, real_images in enumerate(dataloader):
        real_images = real_images.to(device)
        if epoch + 1 <= n_epochs / 2:
            # first half of training: give the discriminator extra steps
            for i in range(3):
                d_loss = discriminator_step(batch_size, latent_dim, real_images)
                g_loss = generator_step(batch_size, latent_dim)
            d_loss = discriminator_step(batch_size, latent_dim, real_images)
        else:
            # second half of training: give the generator extra steps
            for i in range(3):
                g_loss = generator_step(batch_size, latent_dim)
                d_loss = discriminator_step(batch_size, latent_dim, real_images)
            g_loss = generator_step(batch_size, latent_dim)

    # decay the learning rate every 5 epochs
    if (epoch + 1) % 5 == 0:
        lr = lr / 10.0
        g_optimizer, d_optimizer = create_optimizers(generator, discriminator)

    d = d_loss['loss'].item()
    g = g_loss['loss'].item()
    losses.append((d, g))
    time = str(datetime.now()).split('.')[0]
    print(f'{time} | Epoch [{epoch+1}/{n_epochs}] | d_loss: {d:.4f} | g_loss: {g:.4f}')

    # display images during training
    generator.eval()
    with torch.no_grad():
        generated_images = generator(fixed_latent_vector)
    display(generated_images)
    generator.train()
Plot the training losses for the generator and discriminator.
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator', alpha=0.5)
plt.plot(losses.T[1], label='Generator', alpha=0.5)
plt.title("Training Losses")
plt.legend()
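Raw GAN losses are noisy from batch to batch, so a simple moving average can make the trends easier to read. A small sketch of such a smoother (the window size is arbitrary), shown here on synthetic data rather than the actual training curves:

```python
import numpy as np

def smooth(values, window: int = 9):
    """Moving average via convolution; output is len(values) - window + 1 long."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

# synthetic example: a noisy decreasing curve
noisy = np.linspace(1.0, 0.2, 100) + 0.1 * np.random.randn(100)
smoothed = smooth(noisy)
print(len(smoothed))  # 92
```

To apply it to the curves above, you could plot smooth(losses.T[0]) and smooth(losses.T[1]) instead of the raw arrays.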
When you answer this question, consider the following factors:
Answer: The model is good at generating samples that resemble faces. It is capturing important features that distinguish one face from another, and it is good at creating a general representation of a face. However, it does struggle to capture the finer details that would make it indistinguishable from a real image. To improve this, I would make a much larger model with more features for both the generator and discriminator. Then I would train it for more epochs with a smaller learning rate.
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_face_generation.ipynb".
Submit the notebook using the SUBMIT button in the bottom right corner of the Project Workspace.